Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 16 de 16
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Genome Med ; 16(1): 12, 2024 Jan 12.
Artigo em Inglês | MEDLINE | ID: mdl-38217035

RESUMO

Optimal integration of transcriptomics data and associated spatial information is essential towards fully exploiting spatial transcriptomics to dissect tissue heterogeneity and map out inter-cellular communications. We present SEDR, which uses a deep autoencoder coupled with a masked self-supervised learning mechanism to construct a low-dimensional latent representation of gene expression, which is then simultaneously embedded with the corresponding spatial information through a variational graph autoencoder. SEDR achieved higher clustering performance on manually annotated 10 × Visium datasets and better scalability on high-resolution spatial transcriptomics datasets than existing methods. Additionally, we show SEDR's ability to impute and denoise gene expression (URL: https://github.com/JinmiaoChenLab/SEDR/ ).


Assuntos
Comunicação Celular , Perfilação da Expressão Gênica , Humanos , Análise por Conglomerados
2.
Brief Bioinform ; 24(5)2023 09 20.
Artigo em Inglês | MEDLINE | ID: mdl-37544658

RESUMO

MOTIVATION: Recent advances in spatially resolved transcriptomics (ST) technologies enable the measurement of gene expression profiles while preserving cellular spatial context. Linking gene expression of cells with their spatial distribution is essential for better understanding of tissue microenvironment and biological progress. However, effectively combining gene expression data with spatial information to identify spatial domains remains challenging. RESULTS: To deal with the above issue, in this paper, we propose a novel unsupervised learning framework named STMGCN for identifying spatial domains using multi-view graph convolution networks (MGCNs). Specifically, to fully exploit spatial information, we first construct multiple neighbor graphs (views) with different similarity measures based on the spatial coordinates. Then, STMGCN learns multiple view-specific embeddings by combining gene expressions with each neighbor graph through graph convolution networks. Finally, to capture the importance of different graphs, we further introduce an attention mechanism to adaptively fuse view-specific embeddings and thus derive the final spot embedding. STMGCN allows for the effective utilization of spatial context to enhance the expressive power of the latent embeddings with multiple graph convolutions. We apply STMGCN on two simulation datasets and five real spatial transcriptomics datasets with different resolutions across distinct platforms. The experimental results demonstrate that STMGCN obtains competitive results in spatial domain identification compared with five state-of-the-art methods, including spatial and non-spatial alternatives. Besides, STMGCN can detect spatially variable genes with enriched expression patterns in the identified domains. Overall, STMGCN is a powerful and efficient computational framework for identifying spatial domains in spatial transcriptomics data.


Assuntos
Perfilação da Expressão Gênica , Transcriptoma , Simulação por Computador
3.
Bioinformatics ; 39(8)2023 08 01.
Artigo em Inglês | MEDLINE | ID: mdl-37572298

RESUMO

MOTIVATION: Metabolic stability plays a crucial role in the early stages of drug discovery and development. Accurately modeling and predicting molecular metabolic stability has great potential for the efficient screening of drug candidates as well as the optimization of lead compounds. Considering wet-lab experiment is time-consuming, laborious, and expensive, in silico prediction of metabolic stability is an alternative choice. However, few computational methods have been developed to address this task. In addition, it remains a significant challenge to explain key functional groups determining metabolic stability. RESULTS: To address these issues, we develop a novel cross-modality graph contrastive learning model named CMMS-GCL for predicting the metabolic stability of drug candidates. In our framework, we design deep learning methods to extract features for molecules from two modality data, i.e. SMILES sequence and molecule graph. In particular, for the sequence data, we design a multihead attention BiGRU-based encoder to preserve the context of symbols to learn sequence representations of molecules. For the graph data, we propose a graph contrastive learning-based encoder to learn structure representations by effectively capturing the consistencies between local and global structures. We further exploit fully connected neural networks to combine the sequence and structure representations for model training. Extensive experimental results on two datasets demonstrate that our CMMS-GCL consistently outperforms seven state-of-the-art methods. Furthermore, a collection of case studies on sequence data and statistical analyses of the graph structure module strengthens the validation of the interpretability of crucial functional groups recognized by CMMS-GCL. Overall, CMMS-GCL can serve as an effective and interpretable tool for predicting metabolic stability, identifying critical functional groups, and thus facilitating the drug discovery process and lead compound optimization. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are freely available at https://github.com/dubingxue/CMMS-GCL.


Assuntos
Descoberta de Drogas , Redes Neurais de Computação , Projetos de Pesquisa
4.
Brief Bioinform ; 24(5)2023 09 20.
Artigo em Inglês | MEDLINE | ID: mdl-37466210

RESUMO

MOTIVATION: Recent advances in spatial transcriptomics technologies have enabled gene expression profiles while preserving spatial context. Accurately identifying spatial domains is crucial for downstream analysis and it requires the effective integration of gene expression profiles and spatial information. While increasingly computational methods have been developed for spatial domain detection, most of them cannot adaptively learn the complex relationship between gene expression and spatial information, leading to sub-optimal performance. RESULTS: To overcome these challenges, we propose a novel deep learning method named Spatial-MGCN for identifying spatial domains, which is a Multi-view Graph Convolutional Network (GCN) with attention mechanism. We first construct two neighbor graphs using gene expression profiles and spatial information, respectively. Then, a multi-view GCN encoder is designed to extract unique embeddings from both the feature and spatial graphs, as well as their shared embeddings by combining both graphs. Finally, a zero-inflated negative binomial decoder is used to reconstruct the original expression matrix by capturing the global probability distribution of gene expression profiles. Moreover, Spatial-MGCN incorporates a spatial regularization constraint into the features learning to preserve spatial neighbor information in an end-to-end manner. The experimental results show that Spatial-MGCN outperforms state-of-the-art methods consistently in several tasks, including spatial clustering and trajectory inference.


Assuntos
Oftalmopatias Hereditárias , Doenças Genéticas Ligadas ao Cromossomo X , Humanos , Perfilação da Expressão Gênica
5.
Nat Commun ; 14(1): 1155, 2023 03 01.
Artigo em Inglês | MEDLINE | ID: mdl-36859400

RESUMO

Spatial transcriptomics technologies generate gene expression profiles with spatial context, requiring spatially informed analysis tools for three key tasks, spatial clustering, multisample integration, and cell-type deconvolution. We present GraphST, a graph self-supervised contrastive learning method that fully exploits spatial transcriptomics data to outperform existing methods. It combines graph neural networks with self-supervised contrastive learning to learn informative and discriminative spot representations by minimizing the embedding distance between spatially adjacent spots and vice versa. We demonstrated GraphST on multiple tissue types and technology platforms. GraphST achieved 10% higher clustering accuracy and better delineated fine-grained tissue structures in brain and embryo tissues. GraphST is also the only method that can jointly analyze multiple tissue slices in vertical or horizontal integration while correcting batch effects. Lastly, GraphST demonstrated superior cell-type deconvolution to capture spatial niches like lymph node germinal centers and exhausted tumor infiltrating T cells in breast tumor tissue.


Assuntos
Perfilação da Expressão Gênica , Transcriptoma , Encéfalo , Análise por Conglomerados , Centro Germinativo
6.
Bioinformatics ; 38(8): 2254-2262, 2022 04 12.
Artigo em Inglês | MEDLINE | ID: mdl-35171981

RESUMO

MOTIVATION: Graphs or networks are widely utilized to model the interactions between different entities (e.g. proteins, drugs, etc.) for biomedical applications. Predicting potential interactions/links in biomedical networks is important for understanding the pathological mechanisms of various complex human diseases, as well as screening compound targets for drug discovery. Graph neural networks (GNNs) have been utilized for link prediction in various biomedical networks, which rely on the node features extracted from different data sources, e.g. sequence, structure and network data. However, it is challenging to effectively integrate these data sources and automatically extract features for different link prediction tasks. RESULTS: In this article, we propose a novel Pre-Training Graph Neural Networks-based framework named PT-GNN to integrate different data sources for link prediction in biomedical networks. First, we design expressive deep learning methods [e.g. convolutional neural network and graph convolutional network (GCN)] to learn features for individual nodes from sequence and structure data. Second, we further propose a GCN-based encoder to effectively refine the node features by modelling the dependencies among nodes in the network. Third, the node features are pre-trained based on graph reconstruction tasks. The pre-trained features can be used for model initialization in downstream tasks. Extensive experiments have been conducted on two critical link prediction tasks, i.e. synthetic lethality (SL) prediction and drug-target interaction (DTI) prediction. Experimental results demonstrate PT-GNN outperforms the state-of-the-art methods for SL prediction and DTI prediction. In addition, the pre-trained features benefit improving the performance and reduce the training time of existing models. AVAILABILITY AND IMPLEMENTATION: Python codes and dataset are available at: https://github.com/longyahui/PT-GNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Descoberta de Drogas , Redes Neurais de Computação , Humanos , Desenvolvimento de Medicamentos , Proteínas
7.
Methods ; 198: 11-18, 2022 02.
Artigo em Inglês | MEDLINE | ID: mdl-34419588

RESUMO

Coronavirus Disease-19 (COVID-19) has lead global epidemics with high morbidity and mortality. However, there are currently no proven effective drugs targeting COVID-19. Identifying drug-virus associations can not only provide insights into the understanding of drug-virus interaction mechanism, but also guide and facilitate the screening of compound candidates for antiviral drug discovery. Since conventional experiment methods are time-consuming, laborious and expensive, computational methods to identify potential drug candidates for viruses (e.g., COVID-19) provide an alternative strategy. In this work, we propose a novel framework of Heterogeneous Graph Attention Networks for Drug-Virus Association predictions, named HGATDVA. First, we fully incorporate multiple sources of biomedical data, e.g., drug chemical information, virus genome sequences and viral protein sequences, to construct abundant features for drugs and viruses. Second, we construct two drug-virus heterogeneous graphs. For each graph, we design a self-enhanced graph attention network (SGAT) to explicitly model the dependency between a node and its local neighbors and derive the graph-specific representations for nodes. Third, we further develop a neural network architecture with tri-aggregator to aggregate the graph-specific representations to generate the final node representations. Extensive experiments were conducted on two datasets, i.e., DrugVirus and MDAD, and the results demonstrated that our model outperformed 7 state-of-the-art methods. Case study on SARS-CoV-2 validated the effectiveness of our model in identifying potential drugs for viruses.


Assuntos
COVID-19 , Preparações Farmacêuticas , Interações Medicamentosas , Humanos , Redes Neurais de Computação , SARS-CoV-2
8.
BMC Bioinformatics ; 22(1): 609, 2021 Dec 20.
Artigo em Inglês | MEDLINE | ID: mdl-34930120

RESUMO

BACKGROUND: Long non-coding RNAs (lncRNAs) play significant roles in varieties of physiological and pathological processes.The premise of the lncRNA functional study is that the lncRNAs are identified correctly. Recently, deep learning method like convolutional neural network (CNN) has been successfully applied to identify the lncRNAs. However, the traditional CNN considers little relationships among samples via an indirect way. RESULTS: Inspired by the Siamese Neural Network (SNN), here we propose a novel network named Class Similarity Network in coding RNA and lncRNA classification. Class Similarity Network considers more relationships among input samples in a direct way. It focuses on exploring the potential relationships between input samples and samples from both the same class and the different classes. To achieve this, Class Similarity Network trains the parameters specific to each class to obtain the high-level features and represents the general similarity to each class in a node. The comparison results on the validation dataset under the same conditions illustrate the superiority of our Class Similarity Network to the baseline CNN. Besides, our method performs effectively and achieves state-of-the-art performances on two test datasets. CONCLUSIONS: We construct Class Similarity Network in coding RNA and lncRNA classification, which is shown to work effectively on two different datasets by achieving accuracy, precision, and F1-score as 98.43%, 0.9247, 0.9374, and 97.54%, 0.9990, 0.9860, respectively.


Assuntos
RNA Longo não Codificante , Redes Neurais de Computação , RNA Longo não Codificante/genética
9.
Bioinformatics ; 37(16): 2432-2440, 2021 Aug 25.
Artigo em Inglês | MEDLINE | ID: mdl-33609108

RESUMO

MOTIVATION: Synthetic Lethality (SL) plays an increasingly critical role in the targeted anticancer therapeutics. In addition, identifying SL interactions can create opportunities to selectively kill cancer cells without harming normal cells. Given the high cost of wet-lab experiments, in silico prediction of SL interactions as an alternative can be a rapid and cost-effective way to guide the experimental screening of candidate SL pairs. Several matrix factorization-based methods have recently been proposed for human SL prediction. However, they are limited in capturing the dependencies of neighbors. In addition, it is also highly challenging to make accurate predictions for new genes without any known SL partners. RESULTS: In this work, we propose a novel graph contextualized attention network named GCATSL to learn gene representations for SL prediction. First, we leverage different data sources to construct multiple feature graphs for genes, which serve as the feature inputs for our GCATSL method. Second, for each feature graph, we design node-level attention mechanism to effectively capture the importance of local and global neighbors and learn local and global representations for the nodes, respectively. We further exploit multi-layer perceptron (MLP) to aggregate the original features with the local and global representations and then derive the feature-specific representations. Third, to derive the final representations, we design feature-level attention to integrate feature-specific representations by taking the importance of different feature graphs into account. Extensive experimental results on three datasets under different settings demonstrated that our GCATSL model outperforms 14 state-of-the-art methods consistently. In addition, case studies further validated the effectiveness of our proposed model in identifying novel SL pairs. AVAILABILITYAND IMPLEMENTATION: Python codes and dataset are freely available on GitHub (https://github.com/longyahui/GCATSL) and Zenodo (https://zenodo.org/record/4522679) under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

10.
Brief Bioinform ; 22(3)2021 05 20.
Artigo em Inglês | MEDLINE | ID: mdl-32725163

RESUMO

MOTIVATION: human microbes play a critical role in an extensive range of complex human diseases and become a new target in precision medicine. In silico methods of identifying microbe-disease associations not only can provide a deep insight into understanding the pathogenic mechanism of complex human diseases but also assist pharmacologists to screen candidate targets for drug development. However, the majority of existing approaches are based on linear models or label propagation, which suffers from limitations in capturing nonlinear associations between microbes and diseases. Besides, it is still a great challenge for most previous methods to make predictions for new diseases (or new microbes) with few or without any observed associations. RESULTS: in this work, we construct features for microbes and diseases by fully exploiting multiply sources of biomedical data, and then propose a novel deep learning framework of graph attention networks with inductive matrix completion for human microbe-disease association prediction, named GATMDA. To our knowledge, this is the first attempt to leverage graph attention networks for this important task. In particular, we develop an optimized graph attention network with talking-heads to learn representations for nodes (i.e. microbes and diseases). To focus on more important neighbours and filter out noises, we further design a bi-interaction aggregator to enforce representation aggregation of similar neighbours. In addition, we combine inductive matrix completion to reconstruct microbe-disease associations to capture the complicated associations between diseases and microbes. Comprehensive experiments on two data sets (i.e. HMDAD and Disbiome) demonstrated that our proposed model consistently outperformed baseline methods. Case studies on two diseases, i.e. asthma and inflammatory bowel disease, further confirmed the effectiveness of our proposed model of GATMDA. AVAILABILITY: python codes and data set are available at: https://github.com/yahuilong/GATMDA. CONTACT: luojiawei@hnu.edu.cn.


Assuntos
Algoritmos , Asma/microbiologia , Biologia Computacional , Doenças Inflamatórias Intestinais/microbiologia , Microbiota , Modelos Biológicos , Asma/tratamento farmacológico , Desenvolvimento de Medicamentos , Humanos , Doenças Inflamatórias Intestinais/tratamento farmacológico
11.
IEEE J Biomed Health Inform ; 25(1): 266-275, 2021 01.
Artigo em Inglês | MEDLINE | ID: mdl-32750918

RESUMO

Accurately identifying microbe-drug associations plays a critical role in drug development and precision medicine. Considering that the conventional wet-lab method is time-consuming, labor-intensive and expensive, computational approach is an alternative choice. The increasing availability of numerous biological data provides a great opportunity to systematically understand complex interaction mechanisms between microbes and drugs. However, few computational methods have been developed for microbe drug prediction. In this work, we leverage multiple sources of biomedical data to construct a heterogeneous network for microbes and drugs, including drug-drug interactions, microbe-microbe interactions and microbe-drug associations. And then we propose a novel Heterogeneous Network Embedding Representation framework for Microbe-Drug Association prediction, named (HNERMDA), by combining metapath2vec with bipartite network recommendation. In this framework, we introduce metapath2vec, a heterogeneous network representation learning method, to learn low-dimensional embedding representations for microbes and drugs. Following that, we further design a bias bipartite network projection recommendation algorithm to improve prediction accuracy. Comprehensive experiments on two datasets, named MDAD and aBiofilm, demonstrated that our model consistently outperformed five baseline methods in three types of cross-validations. Case study on two popular drugs (i.e., Ciprofloxacin and Pefloxacin) further validated the effectiveness of our HNERMDA model in inferring potential target microbes for drugs.


Assuntos
Algoritmos , Interações Medicamentosas , Humanos
12.
Bioinformatics ; 36(Suppl_2): i779-i786, 2020 12 30.
Artigo em Inglês | MEDLINE | ID: mdl-33381844

RESUMO

MOTIVATION: Human microbes get closely involved in an extensive variety of complex human diseases and become new drug targets. In silico methods for identifying potential microbe-drug associations provide an effective complement to conventional experimental methods, which can not only benefit screening candidate compounds for drug development but also facilitate novel knowledge discovery for understanding microbe-drug interaction mechanisms. On the other hand, the recent increased availability of accumulated biomedical data for microbes and drugs provides a great opportunity for a machine learning approach to predict microbe-drug associations. We are thus highly motivated to integrate these data sources to improve prediction accuracy. In addition, it is extremely challenging to predict interactions for new drugs or new microbes, which have no existing microbe-drug associations. RESULTS: In this work, we leverage various sources of biomedical information and construct multiple networks (graphs) for microbes and drugs. Then, we develop a novel ensemble framework of graph attention networks with a hierarchical attention mechanism for microbe-drug association prediction from the constructed multiple microbe-drug graphs, denoted as EGATMDA. In particular, for each input graph, we design a graph convolutional network with node-level attention to learn embeddings for nodes (i.e. microbes and drugs). To effectively aggregate node embeddings from multiple input graphs, we implement graph-level attention to learn the importance of different input graphs. Experimental results under different cross-validation settings (e.g. the setting for predicting associations for new drugs) showed that our proposed method outperformed seven state-of-the-art methods. Case studies on predicted microbe-drug associations further demonstrated the effectiveness of our proposed EGATMDA method. AVAILABILITY: Source codes and supplementary materials are available at: https://github.com/longyahui/EGATMDA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Preparações Farmacêuticas , Software , Simulação por Computador , Interações Medicamentosas , Humanos , Aprendizado de Máquina
13.
BMC Bioinformatics ; 21(1): 522, 2020 Nov 12.
Artigo em Inglês | MEDLINE | ID: mdl-33183242

RESUMO

BACKGROUND: Long non-coding RNAs (lncRNAs) can exert functions via forming triplex with DNA. The current methods in predicting the triplex formation mainly rely on mathematic statistic according to the base paring rules. However, these methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNA indicates that maybe not all of them can form triplex in practice, and (2) their predictions only consider the theoretical relationship while lacking the features from the experimentally verified data. RESULTS: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model in DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on the experimentally verified data, where the high-level features are learned by the convolutional neural networks. In the fivefold cross validation, the average values of Area Under the ROC curves and PRC curves for removed redundancy triplex-forming lncRNA dataset with threshold 0.8 are 0.9649 and 0.9996, and these two values for triplex DNA sites prediction are 0.8705 and 0.9671, respectively. Besides, we also briefly summarize the cis and trans targeting of triplexes lncRNAs. CONCLUSIONS: The TriplexFPP is able to predict the most likely triplex-forming lncRNAs from all the lncRNAs with computationally defined triplex forming capacities and the potential of a DNA site to become a triplex. It may provide insights to the exploration of lncRNA functions.


Assuntos
DNA/metabolismo , Aprendizado Profundo , Área Sob a Curva , DNA/genética , Modelos Teóricos , RNA Longo não Codificante/genética , RNA Longo não Codificante/metabolismo , Curva ROC , Interface Usuário-Computador
14.
Bioinformatics ; 36(19): 4918-4927, 2020 12 08.
Artigo em Inglês | MEDLINE | ID: mdl-32597948

RESUMO

MOTIVATION: Human microbes play critical roles in drug development and precision medicine. How to systematically understand the complex interaction mechanism between human microbes and drugs remains a challenge nowadays. Identifying microbe-drug associations can not only provide great insights into understanding the mechanism, but also boost the development of drug discovery and repurposing. Considering the high cost and risk of biological experiments, the computational approach is an alternative choice. However, at present, few computational approaches have been developed to tackle this task. RESULTS: In this work, we leveraged rich biological information to construct a heterogeneous network for drugs and microbes, including a microbe similarity network, a drug similarity network and a microbe-drug interaction network. We then proposed a novel graph convolutional network (GCN)-based framework for predicting human Microbe-Drug Associations, named GCNMDA. In the hidden layer of GCN, we further exploited the Conditional Random Field (CRF), which can ensure that similar nodes (i.e. microbes or drugs) have similar representations. To more accurately aggregate representations of neighborhoods, an attention mechanism was designed in the CRF layer. Moreover, we performed a random walk with restart-based scheme on both drug and microbe similarity networks to learn valuable features for drugs and microbes, respectively. Experimental results on three different datasets showed that our GCNMDA model consistently achieved better performance than seven state-of-the-art methods. Case studies for three microbes including SARS-CoV-2 and two antimicrobial drugs (i.e. Ciprofloxacin and Moxifloxacin) further confirmed the effectiveness of GCNMDA in identifying potential microbe-drug associations. AVAILABILITY AND IMPLEMENTATION: Python codes and dataset are available at: https://github.com/longyahui/GCNMDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
COVID-19 , Microbiota , Preparações Farmacêuticas , Algoritmos , Biologia Computacional , Humanos , Pandemias , SARS-CoV-2
15.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1341-1351, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-30489271

RESUMO

Accumulating clinic evidences have demonstrated that the microbes residing in human bodies play a significantly important role in the formation, development, and progression of various complex human diseases. Identifying latent related microbes for disease could provide insight into human disease mechanisms and promote disease prevention, diagnosis, and treatment. In this paper, we first construct a heterogeneous network by connecting the disease similarity network and the microbe similarity network through known microbe-disease association network, and then develop a novel computational model to predict human microbe-disease associations based on random walk by integrating network topological similarity (NTSHMDA). Specifically, each microbe-disease association pair is regarded as a distinct relationship level and, thus, assigned different weights based on network topological similarity. The experimental results show that NTSHMDA outperforms some state-of-the-art methods with average AUCs of 0.9070, 0.8896 ± 0.0038 in the frameworks of Leave-one-out cross validation and 5-fold cross validation, respectively. In case studies, 9, 18, 38 and 9, 18, 45 out of top-10, 20, 50 candidate microbes are verified by recently published literatures for asthma and inflammatory bowel disease, respectively. In conclusion, NTSHMDA has potential ability to identify novel disease-microbe associations and can also provide valuable information for drug discovery and biological researches.


Assuntos
Algoritmos , Biologia Computacional/métodos , Doença , Interações Hospedeiro-Patógeno , Microbiota , Asma/microbiologia , Humanos , Modelos Estatísticos
16.
BMC Bioinformatics ; 20(1): 541, 2019 Nov 01.
Artigo em Inglês | MEDLINE | ID: mdl-31675979

RESUMO

BACKGROUND: An increasing number of biological and clinical evidences have indicated that the microorganisms significantly get involved in the pathological mechanism of extensive varieties of complex human diseases. Inferring potential related microbes for diseases can not only promote disease prevention, diagnosis and treatment, but also provide valuable information for drug development. Considering that experimental methods are expensive and time-consuming, developing computational methods is an alternative choice. However, most of existing methods are biased towards well-characterized diseases and microbes. Furthermore, existing computational methods are limited in predicting potential microbes for new diseases. RESULTS: Here, we developed a novel computational model to predict potential human microbe-disease associations (MDAs) based on Weighted Meta-Graph (WMGHMDA). We first constructed a heterogeneous information network (HIN) by combining the integrated microbe similarity network, the integrated disease similarity network and the known microbe-disease bipartite network. And then, we implemented iteratively pre-designed Weighted Meta-Graph search algorithm on the HIN to uncover possible microbe-disease pairs by cumulating the contribution values of weighted meta-graphs to the pairs as their probability scores. Depending on contribution potential, we described the contribution degree of different types of meta-graphs to a microbe-disease pair with bias rating. Meta-graph with higher bias rating will be assigned greater weight value when calculating probability scores. CONCLUSIONS: The experimental results showed that WMGHMDA outperformed some state-of-the-art methods with average AUCs of 0.9288, 0.9068 ±0.0031 in global leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. In the case studies, 9, 19, 37 and 10, 20, 45 out of top-10, 20, 50 candidate microbes were manually verified by previous reports for asthma and inflammatory bowel disease (IBD), respectively. Furthermore, three common human diseases (Crohn's disease, Liver cirrhosis, Type 1 diabetes) were adopted to demonstrate that WMGHMDA could be efficiently applied to make predictions for new diseases. In summary, WMGHMDA has a high potential in predicting microbe-disease associations.


Assuntos
Bactérias/isolamento & purificação , Infecções Bacterianas/microbiologia , Biologia Computacional/métodos , Algoritmos , Bactérias/química , Fenômenos Fisiológicos Bacterianos , Simulação por Computador , Humanos , Serviços de Informação
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...